Catalytic innovation underlies independent recruitment of polyketide synthases in cocaine and hyoscyamine biosynthesis

Tropane alkaloids such as hyoscyamine and cocaine are of medicinal importance. Only recently has the hyoscyamine biosynthetic machinery been completely elucidated. However, the cocaine biosynthesis pathway remains only partially elucidated. Here we characterize polyketide synthases required for generating 3-oxo-glutaric acid from malonyl-CoA in the cocaine biosynthetic route. Structural analysis shows that these two polyketide synthases adopt a distinctly different active site architecture to catalyze the same reaction as pyrrolidine ketide synthase in hyoscyamine biosynthesis, revealing an unusual parallel/convergent evolution of biochemical function in homologous enzymes. Further phylogenetic analysis suggests lineage-specific acquisition of polyketide synthases required for tropane alkaloid biosynthesis in Erythroxylaceae and Solanaceae species, respectively. Overall, our work elucidates not only a key unknown step in the cocaine biosynthesis pathway but also, more importantly, the structural and biochemical basis for independent recruitment of polyketide synthases in tropane alkaloid biosynthesis, thus broadening the understanding of conservation and innovation of biosynthetic catalysts.

Comparisons between the established hyoscyamine (1) and cocaine (3) biosynthetic routes showed that Solanaceae and Erythroxylaceae plants use markedly different enzymes for similar biosynthetic reactions (Fig. 1b), although 1 and 3 share high chemical structure similarity (Fig. 1a). In Solanaceae, the biosynthesis of 1 and 2 employs tropinone reductases (TRs) of the short-chain dehydrogenase/reductase (SDR) enzyme family 9 and littorine synthase (LS) of the serine carboxypeptidase-like acyltransferase (SCPL-AT) family 10 for catalyzing reduction of the keto group and subsequent esterification (Fig. 1b). However, cocaine biosynthesis in Erythroxylaceae requires methylecgonone reductase (MecgoR) of the aldo-keto reductase (AKR) family 11 and cocaine synthase (CS) of the BAHD acyltransferase (benzylalcohol O-acetyl transferase, anthocyanin O-hydroxycinnamoyl transferase, anthranilate N-hydroxycinnamoyl/benzoyl transferase, and diacetyl vindoline 4-O-acetyltransferase) family 12 to catalyze the corresponding reduction and esterification steps (Fig. 1b). This substantial divergence raises several questions surrounding diversity and evolution of TA biosynthetic machineries in different plant lineages.
In this work, by combining transcriptome annotations, gene expression patterns, and in vitro and in vivo enzyme activity assays, we identify EnPKS1 and EnPKS2 as responsible for cocaine biosynthesis in Erythroxylum novogranatense. Structure-function analysis of EnPKS1/2 reveals a unique active site architecture distinct from that of AaPYKS1, which catalyzes the same OGA-forming reaction in hyoscyamine biosynthesis. Phylogenetic tree analysis and an active site residue exchange assay suggest that EnPKS1/2 and AaPYKS1 evolved independently in Erythroxylaceae and Solanaceae, two distant TA-producing plant lineages. Our work illustrates an unusual case of independent catalytic innovation, providing a fascinating arena for understanding biochemical conservation and evolution in trajectories leading to chemotypic convergence in phylogenetically distant plants.

Identification of EnPKS1/2 in cocaine biosynthesis pathway
We identified seven candidate PKS genes (EnPKS1-7) based on annotations in the transcriptomic sequencing data of E. novogranatense. Among them, EnPKS1-3 exhibited the same characteristic bud and young leaf-predominant expression pattern as two other enzyme genes (MecgoR and CS) well established to function in cocaine biosynthesis (Supplementary Fig. 1). Meanwhile, EnPKS3-5 showed the highest amino acid sequence identities (68-71%) with the OGA-forming PKS (AaPYKS1) in the hyoscyamine biosynthetic pathway (Supplementary Table 1). However, subsequent searches based on active site conservation were unsuccessful, since none of the seven EnPKSs possessed the signature active site residue R134, which has been suggested to be critical for one-round malonyl-CoA condensation to produce OGA in AaPYKS1 16. We nevertheless succeeded in cloning these seven potential PKS genes from an E. novogranatense cDNA library and expressed them in Escherichia coli (Supplementary Fig. 2) for further in vitro functional analysis. Notably, liquid chromatography-mass spectrometry (LC-MS) analysis showed that two of the seven EnPKSs, EnPKS1 and EnPKS2, catalyzed the formation of OGA (7) from malonyl-CoA (Fig. 2a and Supplementary Figs. 3 and 4a), the same activity as that of PYKSs in solanaceous plants. In the presence of N-methyl-Δ1-pyrrolinium (5), the resulting OGA (7) was further consumed to form 4-(1-methyl-2-pyrrolidinyl)-3-oxobutanoic acid (6) by condensation (Fig. 2b and Supplementary Fig. 4b). Moreover, upon feeding of deuterium-labeled N-methyl-Δ1-pyrrolinium, transient expression of EnPKS1, EnPKS2 or AaPYKS1 in combination with tropinone synthase AbCYP82M3 15 in tobacco (Nicotiana benthamiana) leaves resulted in formation of isotopically labeled tropinone (4), supporting an in vivo role of EnPKS1/2 in constructing the tropane ring precursor (Supplementary Fig. 5).
Unexpectedly, amino acid sequence alignment showed that the conserved R134 residue in PYKSs was replaced by a threonine in both EnPKS1 (T133) and EnPKS2 (T133), which was interesting because the R134T variant of AaPYKS1 showed a significant decrease in OGA-forming activity in the previous study 16.

Structural and biochemical basis for the OGA-forming activity in EnPKS1/2
To better understand the detailed catalytic mechanism of EnPKS1 and EnPKS2 that distinguishes them from solanaceous PYKSs, we determined the EnPKS1 and EnPKS2 crystal structures at 2.67 Å and 2.62 Å resolution, respectively (EnPKS1, PDB ID: 7F0G; EnPKS2, PDB ID: 7F0E) (Supplementary Table 3). EnPKS1 and EnPKS2 showed homodimeric structures like other type III PKSs (Fig. 3a, b). Specifically, we observed a distinct catalytic pocket formed by R212, F216, I255 and S339 in monomer A, K138 in monomer B, and the type III PKS-conserved C165-H304-N337 triad (Fig. 3b-d and Supplementary Fig. 7). Since efforts to co-crystallize malonyl-CoA with the EnPKSs (EnPKS1 and EnPKS2) failed, we used molecular docking analysis with the known AaPYKS1 catalytic intermediates 4-carboxy-3-oxobutanoyl (COB) and 4-carboxy-3-oxobutanoyl-CoA (COB-CoA) 16 in the predicted catalytic pocket of EnPKS2 (Fig. 3c-e) to explore its potential catalytic mechanisms. As shown in Fig. 3d, F216 and I255 were predicted to govern the catalytic pocket size via hydrophobic interactions 17. Importantly, K138, which protrudes from the other monomer (Fig. 3b), in conjunction with R212 and S339 could form salt-bridge and hydrogen bonds with the carboxy group of COB/COB-CoA to fix the intermediates (Fig. 3c-e), a function previously suggested to be mediated by R134 and S340 in AaPYKS1 16.
To examine the role of these residues in the active site, site-directed mutants of EnPKS2 were generated. We found that EnPKS2 catalytic activity was abolished by methionine substitution at residue K138 (K138M), which was designed to mimic conventional cross-subunit interactions within type III PKS homodimers 17,18, and the same result was observed for the K138E mutant (Fig. 2c and Supplementary Fig. 8a). However, the EnPKS2 K138R mutant, in which arginine (R) could substitute functionally for lysine (K) in salt bridge formation, retained significant OGA-forming activity (Fig. 2c and Supplementary Fig. 8a). These results demonstrated the critical role of the salt-bridge interactions between K138 and the reaction intermediate in governing EnPKS2 activity (Fig. 3c-e). Meanwhile, large-to-small (R212A) and polar-to-nonpolar (R212L) substitutions of R212 resulted in ~30% and ~64% decreases in EnPKS2 activity, respectively, whereas the functionally equivalent substitution (R212K) showed no difference (Fig. 2c and Supplementary Fig. 8b). In view of the decreased but still high catalytic activity remaining in the R212A and R212L mutants, a further time-course study of the enzymatic reactions was conducted to obtain more quantitative information. The catalytic activity of the R212A and R212L mutants relative to wild-type EnPKS2 was found to be closely associated with the substrate (malonyl-CoA) concentration and the incubation time of the enzymatic reaction (Supplementary Fig. 9). Consequently, before the reactions catalyzed by wild-type EnPKS2 reached equilibrium, 61.5-82.2% and 94.2-96.6% decreases could be observed in the R212A and R212L mutants, respectively (Fig. 2d and Supplementary Fig. 9). For S339, alanine substitution (S339A) did not lead to an apparent decrease in EnPKS2 activity (Fig. 2c and Supplementary Fig. 8c).
These results indicated that K138 plays a vital role in EnPKS2 catalysis of OGA (7) formation and that R212 serves as an auxiliary residue influencing catalytic performance, while S339 is not essential for the catalytic process. Accordingly, K138 together with R212 likely fixes the COB/COB-CoA carboxy group efficiently via salt-bridge interactions, leaving the hydrogen bond interactions mediated by S339 unnecessary for controlling the OGA-forming activity of EnPKS2 (Fig. 3c). Additionally, a small-to-large substitution of S339 (S339L) abolished EnPKS2 activity (Fig. 2c and Supplementary Fig. 8c), possibly because of decreased space in the catalytic pocket (Fig. 3c-e).
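The mutant comparisons above are expressed relative to wild type. A minimal sketch of that arithmetic follows; the function names and the input numbers are illustrative assumptions, not measured data from this study.

```python
def relative_activity(wild_type: float, mutant: float) -> float:
    """Mutant activity as a percentage of wild-type activity."""
    if wild_type <= 0:
        raise ValueError("wild-type activity must be positive")
    return 100.0 * mutant / wild_type


def percent_decrease(wild_type: float, mutant: float) -> float:
    """Percent decrease of mutant activity relative to wild type."""
    return 100.0 - relative_activity(wild_type, mutant)


# Illustrative input only: a mutant retaining 36 units where wild type gives 100 units
# corresponds to a 64% decrease.
example_decrease = percent_decrease(100.0, 36.0)
```

Because both quantities are ratios to wild type, they depend on the substrate concentration and incubation time at which the two activities were measured, which is the point made by the time-course experiment.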

Plasticity of active sites recruited to control OGA-forming activity in EnPKS1/2 and PYKS
Plant type III PKSs form a superfamily sharing high similarity in amino acid sequence (30-95%) and overall protein structure 19. Notably, it has been found that minor modulations of residues lining the catalytic pocket, where the conserved C-H-N catalytic triad is positioned, could result in dramatic changes in the pocket volume and shape, generating functionally different type III PKSs 19. To probe the structural basis for controlling the specific one round of chain elongation in EnPKS1/2, we compared their active-site architectures with those of the type III PKSs pentaketide chromone synthase 20 (PCS) and octaketide synthase 21 (OKS), which catalyze four and seven rounds of chain elongation, respectively, using malonyl-CoA as the sole substrate (Supplementary Fig. 10). Notably, a narrow constriction defined by K138 and R212 was observed between the buried pocket and the active center (C165-H304-N337 triad) in EnPKS1/2 (Supplementary Fig. 10). Together with the potential salt-bridge interactions between K138/R212 and the carboxy group of COB/COB-CoA mentioned above (Figs. 2c and 3), we propose that K138 and R212 create a narrowed catalytic tunnel and strong steric hindrance that prevent intermediate passage, thus terminating the reaction after one round of malonyl-CoA condensation.
It has been shown that a simple modulation of the active site residues could sterically alter the catalytic pocket, resulting in dramatic changes in polyketide chain length and product specificity 19. Therefore, we generated different active site mutants of EnPKS2 to examine whether they could synthesize longer polyketide products. Intriguingly, concomitant with the loss of the OGA-forming activity, triacetic acid lactone (TAL, 8, Supplementary Fig. 4c), the product expected from two rounds of malonyl-CoA extension, was detected in the reactions catalyzed by the EnPKS2 mutants K138M, K138E, and K138M/R212A (Fig. 4 and Supplementary Fig. 11). Consistent with the suggested auxiliary role of R212 in EnPKS2-catalyzed formation of OGA, the single mutation R212A or R212L did not result in production of TAL (Supplementary Fig. 11). Therefore, K138 is the key residue that controls polyketide chain elongation in EnPKS2. In addition, when we replaced AaPYKS1 R134 16 with alanine (R134A), TAL (8) was also found in the corresponding catalytic reaction (Fig. 4 and Supplementary Fig. 11). By superposition, we found that K138 of EnPKS2 and R134 of AaPYKS1 were located at non-equivalent positions in the structural scaffold (Fig. 3f). These findings suggested that different solutions have been developed in EnPKS2 and AaPYKS1 to achieve the steric constraints during polyketide chain elongation, resulting in the same OGA-forming activity. Moreover, apart from TAL (8), we did not detect any longer polyketide products, most probably because the environment inside the catalytic pocket remains sterically restricted (Supplementary Fig. 10). Collectively, EnPKS1 and EnPKS2 in E. novogranatense adopt a distinct architecture by recruiting two specific amino acid residues, R212 from one monomer and K138 from the other monomer, to mediate the same catalytic activity as that of solanaceous PYKSs, which employ R134 to limit malonyl-CoA elongation (Fig. 3f), indicating plasticity of the active sites recruited to control OGA-forming activity in type III PKSs.

Lineage-specific acquisition of PKSs required for TA biosynthesis
The cocaine-producing plant family Erythroxylaceae is a clade in the Malpighiales, while the hyoscyamine-producing Solanaceae is placed in the order Solanales. To further understand the evolutionary trajectories of PYKS and EnPKS1/2 in Solanaceae and Erythroxylaceae plants, respectively, we performed phylogenetic analysis of genome-wide samples of PKSs from Malpighiales and Solanales species. Intriguingly, all putative TA-producing species of Solanales analyzed here (Solanum tuberosum, Solanum lycopersicum, Capsicum annuum, Petunia inflata, Ipomoea triloba and Cuscuta campestris) have PKSs that phylogenetically group with previously characterized PYKSs 16 (AaPYKS1, AbPYKS1, DsPYKS1; the PYKS clade) (Fig. 5 and Supplementary Figs. 12 and 13). Consistent with this finding, in vitro enzymatic assays indicated that the catalytic activity in OGA (7) production was also conserved in the corresponding SlPYKS (Sly XP 004239898.1, S. lycopersicum 22, tomato) (Supplementary Fig. 14), thus revealing a putative element required for TA (calystegine) biosynthesis in plants 23. Notably, no PYKS gene was retained in the tobacco genome 24 (Nicotiana attenuata), which is consistent with the predominant production of the alkaloid nicotine rather than TAs in this solanaceous plant (Fig. 5 and Supplementary Fig. 13). In Malpighiales, a presumed ortholog of EnPKS1/2 was found only in Kandelia obovata 25 (KoPKS, Kob GWHPACBH01626) (Fig. 5 and Supplementary Figs. 13 and 14), a putative TA-producing species (Rhizophoraceae) 26,27 that is closely related to Erythroxylaceae plants. Moreover, sequence alignment of genome-wide samples of PKSs from Malpighiales and Solanales species revealed that K138 and R212, the active site residues of EnPKS1 and EnPKS2, appeared only in KoPKS (Supplementary Fig. 12), suggesting that EnPKSs capable of OGA production in E. novogranatense emerged specifically in the last common ancestor of Erythroxylaceae and Rhizophoraceae plants in Malpighiales.
We therefore named this clade of PKSs as Neo-PYKS since they have the same function as that of PYKSs in Solanales (Fig. 5).
There are different hypotheses regarding the origins of the variability in the functional residues that play the same role in homologous enzymes 28. EnPKS2 and AaPYKS1 might have evolved from the same ancestral PKS, which recruited active sites independently in different phylogenetic lineages to acquire the same catalytic activity. Alternatively, parallel/convergent evolution of two ancestral PKSs with different functions may have occurred. We examined these two possibilities by switching the two different active site sets between EnPKS2 and AaPYKS1. Thus, two triple mutants, R134T/M139K/A213R of AaPYKS1 and T133R/K138M/R212A of EnPKS2, were generated. Interestingly, neither mutant showed the expected OGA-forming activity (Supplementary Fig. 15), raising the possibility that EnPKS2 and AaPYKS1 evolved independently from two primordial PKSs.
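The mutant labels used above follow the common convention of wild-type residue, position, substituted residue, with multiple substitutions joined by slashes. As a minimal sketch of how such labels can be read and applied to a sequence, the hypothetical helpers below parse the notation and check the wild-type residue before substituting. The function names and the three-residue toy sequence are assumptions for illustration and are not EnPKS2 or AaPYKS1 sequences.

```python
import re

# One substitution: wild-type residue, 1-based position, new residue (e.g. "K138M").
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutations(label: str) -> list[tuple[str, int, str]]:
    """Split a label such as 'T133R/K138M/R212A' into (wild type, position, new) tuples."""
    parsed = []
    for token in label.split("/"):
        match = _MUTATION.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognized mutation token: {token!r}")
        parsed.append((match.group(1), int(match.group(2)), match.group(3)))
    return parsed


def apply_mutations(sequence: str, mutations: list[tuple[str, int, str]]) -> str:
    """Return a copy of the sequence with each substitution applied (1-based positions)."""
    residues = list(sequence)
    for wild_type, position, new in mutations:
        if position < 1 or position > len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at position {position}")
        residues[position - 1] = new
    return "".join(residues)


# Labels taken from the text; parsing needs no sequence information.
enpks2_triple = parse_mutations("T133R/K138M/R212A")
aapyks1_triple = parse_mutations("R134T/M139K/A213R")
```

Checking the wild-type residue before substitution catches numbering mismatches, which matters here because the corresponding positions in the two enzymes are offset by one (T133 versus R134).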

Discussion
The biosynthetic route of cocaine has remained largely undetermined. In this study, we identified EnPKSs responsible for tropane ring precursor construction in the cocaine biosynthesis pathway, thus resolving a long-standing question as to whether the same set of enzymes is employed by the cocaine and hyoscyamine biosynthetic machineries. Previously, AaPYKS1 was the unusual type III PKS identified in the hyoscyamine and scopolamine biosynthesis pathway that uses a specific catalytic pocket with the conserved R134 and S340 catalytic site to produce OGA (7) 16. Although the divergence of enzymes responsible for tropane ring decoration has been characterized between Solanaceae and Erythroxylaceae plants 2, we were still surprised that active site-based screening failed to identify PKSs involved in synthesis of the conserved tropane skeleton in E. novogranatense. Subsequent structure-function analysis revealed that spatially non-equivalent active sites were employed by EnPKS1/2 (R212 and K138) and AaPYKS1 (R134) to mediate the specific one-round chain elongation to generate OGA (7) (Fig. 3f and Supplementary Fig. 10), indicating that independent catalytic innovation events may have occurred in PKS homologs.
Previous studies have widely addressed the participation of plant type III PKSs in the formation of a multitude of diverse scaffolds for medicinally valuable plant secondary metabolites, such as cannabinoids, curcuminoids, and quinolones 19,29,30. In addition, parallel evolution in the PKS family that confers the same catalytic innovation in disparate plant lineages has been reported, such as the repeated independent emergence of stilbene synthases from chalcone synthases 30. Generally, the same or similar steric features of the active site can be observed in parallel-evolving homologs with identical catalytic properties 31,32. The distinct differences in the active site cavities we found between EnPKS1/2 and AaPYKS1 illustrate a rare example of structural and functional evolution of PKS homologs, suggesting that independent mechanistic mutations have occurred in the ancestral PKSs to converge on the same OGA-forming activity. Recently, two PKSs (HsPKS4 and PcPKS1) with the same conserved R134 and S340 residues as PYKSs, which produce OGA (7) for constructing the pelletierine block of Lycopodium alkaloids, were characterized in Huperzia serrata and Phlegmariurus cryptomerianus, respectively 33,34 (Supplementary Fig. 12). Considering the potential versatility of OGA (7) in building diverse carbon skeletons via the nonenzymatic Mannich-like condensation 16,33,34, we hypothesize that more PKSs will be identified in different secondary metabolite pathways based on the unique active sites (R212 and K138) in EnPKSs. This unique example also cautions that the conventional search for enzymes catalyzing the same reaction based on conserved active sites could be ineffective in some cases, and that genome/transcriptome data combined with structural analysis will be a powerful way to explore the diverse enzymatic elements behind the metabolic diversity in plants.
Phylogenetic analyses of the PKSs from both Malpighiales and Solanales species enabled a landscape view of the origins of OGA-producing PKSs in these two phylogenetically distant families. The distinct distribution of the Neo-PYKS clade and the PYKS clade in the phylogenetic tree (Fig. 5), the spatially non-equivalent active sites mentioned above, and especially the non-interchangeable active sites between EnPKS2 and AaPYKS1 strongly suggest that EnPKS1/2 (Neo-PYKS clade) and PYKSs (PYKS clade) originated independently from non-orthologous PKS progenitors. This implies that the TA biosynthetic machineries in hyoscyamine-producing Solanaceae and cocaine-producing Erythroxylaceae probably emerged via independent recruitment of PKS homologs with divergent active site architectures that catalyze an identical chemical reaction for tropane ring precursor construction. This unusual evolutionary solution to reach the same catalytic reaction underscores the plasticity and adaptability of secondary metabolite catalysts. In addition, the discovery of unique PKSs in the cocaine biosynthetic pathway and the insights into the biochemical evolution of homologous enzymes can provide a theoretical framework for the rational design of diverse protein scaffolds for synthetic biology and metabolic engineering in the innovation of specialized metabolites for medicinal and industrial applications.

Materials and experimental procedures
The reagents, solvents, and restriction enzymes were purchased from standard commercial sources and used directly. PCR amplifications were carried out on a Bio-Rad T100 thermal cycler using Phanta® Super Fidelity DNA Polymerase (P505-d3, Vazyme, China). 3-Oxo-glutaric acid (OGA, 7, 165115-25g) and malonyl-CoA (M4263-5mg) were purchased from Sigma-Aldrich Co. (USA). Triacetic acid lactone (TAL, 8) was purchased from Macklin (H811426-25g). Primer synthesis and DNA sequencing were performed by TsingKe Co. (China). LC-MS analysis was conducted on an Agilent 1290/6530 system and analyzed with Agilent MassHunter. HPLC analysis was conducted on a HITACHI Chromaster system equipped with a DAD detector and a YMC-Triart C18 column (I.D. 4.6 mm × 250 mm, Japan), at a flow rate of 1.0 mL/min and a column temperature of 25°C. ChemBioDraw Ultra 14.0 was used for drawing chemical structures. OriginPro 9.0 was used for LC-MS and HPLC data visualization.

Enzyme assay and kinetic parameters analysis
The standard assay mixture (100 μL), which contained potassium phosphate buffer (100 mM K2HPO4/KH2PO4, pH 8.0), 0.5 mM malonyl-CoA (for detection of the formation of compound 6, 1 mM 5 was also added) and 30 μg enzyme, was incubated at 30°C for 1 h. The reactions were stopped by adding 10 μL 20% HCl. After centrifugation, the supernatant was used for LC-MS or HPLC analysis. The analysis was performed using water with 0.1% formic acid as solvent A and methanol with 0.1% formic acid as solvent B. The injections were eluted with 5% B for 10 min.
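The quantities in the standard assay can be checked with a short calculation. The sketch below converts the stated concentrations and volumes into absolute amounts and computes the dilution introduced by the HCl quench. The helper names are hypothetical, and the dilution factor assumes that volumes are simply additive.

```python
def amount_nmol(concentration_mM: float, volume_uL: float) -> float:
    """Amount of solute in nmol; 1 mM equals 1 nmol per microliter."""
    return concentration_mM * volume_uL


def quench_dilution_factor(reaction_uL: float, quench_uL: float) -> float:
    """Fraction of the original concentration left after adding the quench (additive volumes assumed)."""
    return reaction_uL / (reaction_uL + quench_uL)


# Standard assay from the text: 0.5 mM malonyl-CoA and 30 ug enzyme in 100 uL,
# stopped with 10 uL of 20% HCl.
malonyl_coa_nmol = amount_nmol(0.5, 100.0)         # 50 nmol of substrate per reaction
enzyme_ug_per_uL = 30.0 / 100.0                    # 0.3 ug enzyme per uL
dilution = quench_dilution_factor(100.0, 10.0)     # 100/110 of the pre-quench concentration
```

The kinetic assays described next use 50 μL reactions quenched with 5 μL HCl, which gives the same dilution factor under the same assumption.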
Based on measurements of the initial reaction velocity, the reaction conditions for enzyme kinetic assays were set as follows: varied malonyl-CoA concentrations (20-1000 μM) and enzyme (final concentration 1 ng/μL for EnPKS1, 2 ng/μL for EnPKS2 or 1 ng/μL for AaPYKS1) in a final volume of 50 μL potassium phosphate buffer (100 mM K2HPO4/KH2PO4, pH 8.0) at 30°C (reaction time 5 min for EnPKS1, 8 min for EnPKS2 or 5 min for AaPYKS1). The reactions were quenched by adding 5 μL 20% HCl. The reaction product (OGA, 7) was quantified using HPLC. Kinetic parameter values were calculated with GraphPad Prism 7 software.
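For readers who want to see how kinetic parameters follow from initial-velocity data, here is a minimal sketch under stated assumptions. The text only says the values were calculated in GraphPad Prism 7 and does not name the model or fitting method, so the sketch assumes Michaelis-Menten kinetics and swaps in a Hanes-Woolf linearization, [S]/v = [S]/Vmax + Km/Vmax, fitted by ordinary least squares. The velocities are synthetic, generated from assumed parameters, and are not data from this study.

```python
def michaelis_menten(substrate: float, vmax: float, km: float) -> float:
    """Initial velocity predicted by the Michaelis-Menten equation."""
    return vmax * substrate / (km + substrate)


def hanes_woolf_fit(substrate: list[float], velocity: list[float]) -> tuple[float, float]:
    """Estimate (Vmax, Km) from the straight line [S]/v = [S]/Vmax + Km/Vmax."""
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, velocity)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals 1 / Vmax
    intercept = mean_y - slope * mean_x    # equals Km / Vmax
    return 1.0 / slope, intercept / slope


# Synthetic, noise-free example within the 20-1000 uM range used in the assays.
# The "true" parameters below are assumptions chosen only to exercise the fit.
concentrations_uM = [20.0, 50.0, 100.0, 200.0, 500.0, 1000.0]
true_vmax, true_km = 2.0, 100.0
rates = [michaelis_menten(s, true_vmax, true_km) for s in concentrations_uM]
vmax_fit, km_fit = hanes_woolf_fit(concentrations_uM, rates)
```

With real, noisy measurements a direct nonlinear least-squares fit of the Michaelis-Menten equation is generally preferred, because linearized forms distort the error structure. The linear version is used here only because it needs no external libraries.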

Crystallization and structure determination
Crystals of EnPKS1 and EnPKS2 were grown using the sitting drop vapor diffusion method. Successful crystal growth could be observed in crystallization buffer (0.2 M calcium acetate, 20% w/v PEG3350, 0.1 M Tris-HCl pH 7.0) at 18°C. Diffraction data were collected from a single crystal at Shanghai Synchrotron Radiation Facility beamline 17U with a wavelength of 0.9795 Å at 100 K. The diffraction data were processed and scaled with XDS (BUILT = 20210205) 35. The structures were solved by the molecular replacement method using the structure of chalcone synthase (PDB: 6DXD). The initial model was built using Phenix 1.0 36. Manual adjustment of the model was carried out using the program Coot-0.9.4 37 and the models were refined with Phenix 1.0 and Refmac5 38.
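As a brief aside on how the stated wavelength relates to the reported resolution limits, Bragg's law (λ = 2d sin θ, first order) gives the scattering angle at which reflections of a given d-spacing appear. The sketch below is a generic textbook calculation, not part of the authors' processing pipeline, and the function name is hypothetical.

```python
import math


def bragg_two_theta_deg(wavelength_A: float, d_spacing_A: float) -> float:
    """Scattering angle 2-theta in degrees for first-order diffraction (lambda = 2 d sin(theta))."""
    sin_theta = wavelength_A / (2.0 * d_spacing_A)
    if not 0.0 < sin_theta <= 1.0:
        raise ValueError("d-spacing is not reachable at this wavelength")
    return 2.0 * math.degrees(math.asin(sin_theta))


# Wavelength from the text (0.9795 A) and the two reported resolution limits.
two_theta_enpks1 = bragg_two_theta_deg(0.9795, 2.67)
two_theta_enpks2 = bragg_two_theta_deg(0.9795, 2.62)
```

A smaller d-spacing (higher resolution) corresponds to a larger scattering angle, so the 2.62 Å limit sits slightly further out on the detector than the 2.67 Å limit.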

Molecular docking of catalytic intermediates with EnPKS2
Rigid molecular docking was performed in AutoDock 4.2 39. The ligands 4-carboxy-3-oxobutanoyl (COB) and 4-carboxy-3-oxobutanoyl-CoA (COB-CoA) were extracted from structures downloaded from the PDB database (PDB ID: 6J1M and 6J1N, respectively) and docked into the binding sites of EnPKS2. The key parameters, such as grid number and algorithm, were left at their default values, while the rotatable bonds in the ligand were kept completely rigid. Finally, one hundred independent docking runs were generated and the complex structure with lower binding energy and favorable orientation was selected. PyMOL 2.4 (http://www.pymol.org) was used for viewing the molecular interactions and for image processing.
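The final selection step, choosing a pose by binding energy and orientation, can be sketched as a simple filter followed by a minimum. The text does not say how a "favorable orientation" was judged, so the distance criterion below (ligand carboxy group to K138, with a 4.0 Å cutoff) is a hypothetical stand-in. The `Pose` class, its field names, and the toy values are likewise assumptions and do not reproduce AutoDock output.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    run: int                      # index of the docking run
    binding_energy: float         # kcal/mol; more negative is more favorable
    carboxy_to_k138_A: float      # hypothetical orientation metric, in angstroms


def select_pose(poses: list[Pose], max_distance_A: float = 4.0) -> Pose:
    """Return the lowest-energy pose among those that meet the orientation criterion."""
    favorable = [p for p in poses if p.carboxy_to_k138_A <= max_distance_A]
    if not favorable:
        raise ValueError("no pose satisfies the orientation criterion")
    return min(favorable, key=lambda p: p.binding_energy)


# Toy values for illustration only.
toy_poses = [
    Pose(run=1, binding_energy=-6.1, carboxy_to_k138_A=7.5),   # best energy, poor orientation
    Pose(run=2, binding_energy=-5.8, carboxy_to_k138_A=3.1),
    Pose(run=3, binding_energy=-5.2, carboxy_to_k138_A=2.9),
]
chosen = select_pose(toy_poses)
```

Filtering on orientation before ranking by energy means the globally lowest-energy pose can be passed over, as run 1 is in the toy data, which matches the two-part selection described in the text.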

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The transcriptome datasets of E. novogranatense have been deposited in NCBI under accession numbers SRR15399168, SRR15399169, SRR15399170, SRR15399171, SRR15399172, SRR15399173, SRR15399174, SRR15399175, SRR15399176, SRR15399177, SRR15399178, SRR15399179, and SRR15399180. The gene sequences of EnPKS1 and EnPKS2 have been deposited in GenBank under accession numbers MZ819697 and MZ819696. The atomic models of EnPKS1 and EnPKS2 have been deposited in the Protein Data Bank under PDB IDs 7F0G and 7F0E. All data that support the findings of this study are available in the main text and the supplementary information. Source data are provided with this paper.

Material availability
All unique materials are readily available from the corresponding author on request.